Abstract: In studies involving lifetimes, observed survival times are frequently
censored and possibly subject to biased sampling. In this paper, we model
survival times under biased sampling (a.k.a. biased survival data) by a
semi-parametric model, in which the selection function $w(t)$ (that leads to the
biased sampling) is specified up to an unknown finite dimensional parameter
$\theta$, while the density function $f(t)$ of the survival times is assumed only
to be smooth. Under this model, two estimators are derived to estimate the
density function $f$, and a pseudo maximum likelihood estimation procedure is
developed to estimate $\theta$. The identifiability of the estimation problem is
discussed and the performance of the new estimators is illustrated via both
simulation studies and a real data application.



1. Introduction 

The problem of analyzing survival data arises in many application fields, such as 
clinical trials in medicine, reliability assessments in engineering, biology, epidemiol- 
ogy and public health. Censoring is a common phenomenon accompanying survival 
data, often due to voluntary or involuntary drop-out of study subjects. In addi- 
tion, survival data may have been drawn by biased sampling (with or without our 
knowledge) in which whether a survival time T can be observed depends on a se- 
lection function w(t) which is the probability of observing T if the true value of T 
is t. Survival data drawn under such biased sampling, when $w(t)$ is not a constant,
are hereafter called biased survival data (or a biased survival sample). When
$w(t)$ is a constant, the survival data are called standard survival data. Here
are three examples of biased survival data, with the first given in more detail.

1. In a study of Scleroderma, a rare disease, some data of all cases of Scle- 
roderma diagnosed in Michigan from 1980 to 1991 were collected and the 
times from diagnosis of Scleroderma to death were recorded [3]. Based on the
Kaplan-Meier (K-M) estimates of survival curves for patients diagnosed in 
1980-1985 versus 1986-1991, Gillespie (one of the authors in [3]) found that 
the earlier group patients (from 1980-1985) lived significantly longer than 
the later group patients (from 1986-1991). What had happened? If anything 
had changed, medical care should have improved in 1986-1991 over 1980- 
1985 and hence the second group of patients should have had better survival 
times. According to Gillespie, their sources of information included hospital 
databases and responses from private physicians. Unfortunately, because hos- 
pital records did not always go back to 1980, and physicians did not always 
remember patients they saw many years ago, patients who were still alive (and
thus had more current hospital records) were more likely to be collected in
the sample than those who died in the early period. This resulted in a biased
survival sample for the 1980-1985 group. Indeed, as Gillespie stated, "We feel
that the result is entirely due to our length-biased sample." Length-biased
sampling is a special example of biased sampling with $w(t) \propto t$.

2. In assessing the familial risk of a disease, one often uses a reference
database, which is a collection of family histories of cases typically assembled
as a result of one family member being diagnosed with the disease. Clearly, the
larger a family is, the greater the probability that this family will be found
in the registry.

3. In cancer screening programs, whether the pathological changes of a patient 
in the preclinical phase can be discovered depends very much on the phase of 
the tumor. 

Let $f(t)$ be the true probability density function (pdf) of the survival time $T$ and
$F(t)$ the corresponding cumulative distribution function (cdf). If the sampling bias
in a biased survival sample is ignored in an estimation of $f$ or $F$, the resulting
estimates are not consistent and can be misleading, as shown in Example 1. In fact, the
missingness resulting from biased sampling is sometimes also called "non-ignorable"
missingness because it leads to an observed sample that has a density weighted by $w$,
as demonstrated in (1) below, in contrast to missing at random (MAR), in which
whether a subject is missing is independent of $t$ and may be ignored.

In this paper, we propose a semi-parametric model that incorporates both the
censoring information and the biased sampling scheme for modeling biased survival
data (Section 2). In our model, the density function $f(t)$ of the survival times is
assumed to be smooth, and the selection function $w(t)$ is specified up to an unknown
finite dimensional parameter $\theta$ and is a constant when $\theta = \theta_0$; for
example, $w(t) \propto t^{\theta}$, for $\theta \ge 0$. So this model is applicable to both
biased survival data and standard survival data. The identifiability of estimating
$(f, w)$ is also discussed. The parameter $(f, \theta)$ under our semi-parametric model
is "sieve" identifiable. In Section 3, two estimators, one called the weighted kernel
estimator (WKE) and the other called the transformation-based estimator (TBE), are
derived for estimating $f$, and a "pseudo" maximum likelihood procedure is proposed
for estimating $\theta$. Our new estimators are compared with those that ignore either
censoring or sampling biases or assume that the selection function is known, and are
examined as the sample size increases (Section 4). The $L_1$ and $L_2$ distances and
the MSE of our estimators decrease as the sample size grows, while the naive
estimator (which ignores both censoring and selection biases), the K-M estimator and
the Jones estimator [8] do not perform as well as ours. In terms of a confidence
interval for $F(t)$, our WKE and TBE also outperform the naive, the K-M and the Jones
estimators. The application of the new estimators is illustrated via an analysis of
a survival data set on time until death of bone-marrow transplant patients in
Section 5. The paper concludes with some discussion in Section 6.

2. Model 

In an idealized situation, there would be $T_1, \ldots, T_N \overset{\text{iid}}{\sim} f$
for us to make an inference about $f$. In reality, we observe only a subset of
$T_1, \ldots, T_N$, where each $T_i$ is included in the subset with probability $w(t_i)$
if the value of $T_i$ is $t_i$. The function $w(t)$ is called the selection function.
Abusing the notation a little, we denote the subset by $T = \{t_1, t_2, \ldots, t_n\}$;
then the observed sample size $n \sim \text{Binomial}(N, \kappa)$, where
$\kappa = E_f(w(T)) < \infty$ is the mean value of the probability of observing $T$. The
observed sample $T$ no longer has the common pdf $f$. Instead, conditioning on $n$,

$$t_1, \ldots, t_n \overset{\text{iid}}{\sim} f_w(t) = \frac{w(t) f(t)}{\kappa}. \tag{1}$$

Thus, if a standard procedure that ignores the selection bias in $T$ is used, the
resulting density estimate based on the $t_i$'s might be consistent for $f_w$, but not
for $f$, the density of interest. A new procedure that accounts for both censoring and
selection biases must be developed to delineate $f$ and $w$.
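
The sampling mechanism behind (1) can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the Weibull form of $f$, the selection function $w(t) = \min\{1, (t/a)^{\theta}\}$ with the hypothetical cap `a`, and the function name `draw_biased_sample` are illustrative choices, not part of the model above.

```python
import random

def draw_biased_sample(N, theta, a=3.0, seed=0):
    """Draw N ideal lifetimes from a Weibull(shape=2, scale=1) density f and keep
    each one with probability w(t) = min(1, (t / a) ** theta).
    Both f and w are hypothetical choices made for illustration."""
    rng = random.Random(seed)
    ideal = [rng.weibullvariate(1.0, 2.0) for _ in range(N)]
    kept = [t for t in ideal if rng.random() < min(1.0, (t / a) ** theta)]
    return ideal, kept

ideal, kept = draw_biased_sample(N=20000, theta=0.5)
kappa_hat = len(kept) / len(ideal)      # n / N estimates kappa = E_f[w(T)]
mean_ideal = sum(ideal) / len(ideal)    # mean under f
mean_kept = sum(kept) / len(kept)       # mean under the weighted density f_w
print(kappa_hat, mean_ideal, mean_kept)
```

Because this $w$ increases in $t$, the retained sample over-represents long lifetimes, so its mean exceeds the mean of the ideal sample.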

Identifiability. If both $f(t)$ and $w(t)$ in (1) are completely unknown, the problem
of estimating both $f$ and $w$ based on $T$, one biased sample, is unidentifiable. For
any $w(t)$ and an arbitrary $h(t) > 0$ for which the integral
$\kappa' = \int f(u)/h(u)\,du$ is finite, the pair $(w(t), f(t))$ gives the same
likelihood as the pair $(h(t)w(t),\ f(t)/(h(t)\kappa'))$. To ensure identifiability,
either (a) $w(t)$ or $f(t)$ has to be assumed to be parametric; or (b) there is another
biased sample $T'$ which has some overlap with $T$. The overlap $T \cap T'$ provides
some information about $w$ and hence allows both $f$ and $w$ to be estimable
nonparametrically in the range of $T \cap T'$. See Lloyd and Jones [12] and Wang and
Sun [25] for more information on nonparametric estimates of $f$ and $w$ based on two
biased samples.
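
The non-identifiability argument can be checked numerically. The sketch below uses hypothetical choices ($f(t) = e^{-t}$, $w(t) = t$, $h(t) = 1 + t$) and a simple midpoint rule; it only verifies that the two pairs lead to the same weighted density.

```python
import math

def f(t): return math.exp(-t)     # hypothetical density on (0, inf)
def w(t): return t                # hypothetical selection function
def h(t): return 1.0 + t          # arbitrary positive function

def integrate(g, lo=0.0, hi=40.0, m=40000):
    """Midpoint rule; hi=40 cuts off a negligible exponential tail."""
    step = (hi - lo) / m
    return step * sum(g(lo + (k + 0.5) * step) for k in range(m))

kappa_prime = integrate(lambda u: f(u) / h(u))
def f2(t): return f(t) / (h(t) * kappa_prime)   # second density
def w2(t): return h(t) * w(t)                   # second selection function

norm1 = integrate(lambda u: w(u) * f(u))
norm2 = integrate(lambda u: w2(u) * f2(u))
def fw1(t): return w(t) * f(t) / norm1          # weighted density from (w, f)
def fw2(t): return w2(t) * f2(t) / norm2        # weighted density from (hw, f/(h kappa'))

max_gap = max(abs(fw1(t) - fw2(t)) for t in [0.1, 0.5, 1.0, 2.0, 5.0])
print(max_gap)
```

The gap is at the level of floating-point rounding, so one biased sample cannot distinguish the two pairs.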

In this paper, we consider the case when there is only one biased survival sample,
which is $T$ with some of its members censored. Here $N$, the size of the idealized
sample, is not assumed to be known. In other words, our first model assumption is:

Model Assumption 2.1. The observable sample is
$(T, I) = \{(\tilde{t}_i, I_i),\ i = 1, \ldots, n\}$,
where $\tilde{t}_i = t_i \sim f_w$ if $t_i$ is uncensored ($I_i = 1$), and
$\tilde{t}_i = c_i$, a censoring time, if $t_i$ is censored ($I_i = 0$). The censoring
times $c_i$'s are independent of the survival times $t_i$'s and have a common censoring
distribution that has the same support as that of $f_w$. Further, a right censoring
scheme is assumed: $I_i = 1$ if $t_i < c_i$, and $I_i = 0$ otherwise.

To ease the notation, we abuse the notation again and write $t_i$ for $\tilde{t}_i$
hereafter. So, we observe $\{(t_i, I_i)\}$, where if $I_i = 1$, the actual survival time
is $t_i \sim f_w$, uncensored; if $I_i = 0$, the survival time is censored and is greater
than $t_i$.

Next, given only one biased survival sample, we shall assume either $w$ or $f$ to
be a parametric function. If both $f$ and $w$ are parametric functions, estimating $f$
and $w$ is equivalent to estimating parameters in a parametric survival model. For
example, let

$$f(t) \propto \gamma\alpha\, t^{\alpha-1} e^{-\gamma t^{\alpha}}, \quad \text{and} \quad w(t) \propto t^{\beta};$$

then the weighted density is

$$f_w(t) = \frac{\gamma\alpha\, t^{\alpha+\beta-1} e^{-\gamma t^{\alpha}}}{\kappa'},$$

where $\kappa'$ is the normalizing constant. The unknown parameters
$\alpha, \beta, \gamma$ can be estimated by maximizing the likelihood

$$L = \prod_i [f_w(t_i)]^{I_i}\, [S_w(t_i)]^{1-I_i}, \tag{2}$$

where $S_w(t_i) = 1 - \int_0^{t_i} f_w(u)\,du$ is the survival function at $t_i$. The
expression of the resulting mle from (2) may be complicated but the estimates are
straightforward to compute. Hence, as long as the parameters are identifiable from
(2), the parameters can be estimated using standard parametric estimation
procedures.^1

^1 In the case where both $f(t)$ and $w(t)$ are specified up to an unknown finite
dimensional parameter, typical identifiability conditions for a parametric model are
still needed (though standard) to estimate the parameters consistently. For example,
if $f(t) \propto e^{\alpha t}$ and $w(t) \propto e^{-\beta t}$, then
$f_w(t) \propto e^{(\alpha-\beta)t}$. Clearly, only $\alpha - \beta$ is identifiable
based on one biased sample.
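
As a worked instance of the censored-data likelihood (2), consider the simplest case in which the weighted density is exponential, $f_w(t) = \lambda e^{-\lambda t}$, so that a single rate $\lambda$ is all that one biased sample can identify. The sketch below is an illustration only: the exponential censoring times and the names `log_lik`, `true_lam` and `lam_hat` are assumptions made for the example.

```python
import math
import random

def log_lik(lam, times, flags):
    """Log of (2) when f_w(t) = lam * exp(-lam * t) and S_w(t) = exp(-lam * t):
    an uncensored point adds log f_w(t), a censored point adds log S_w(t)."""
    return sum(I * math.log(lam) - lam * t for t, I in zip(times, flags))

rng = random.Random(1)
true_lam = 2.0
times, flags = [], []
for _ in range(5000):
    t = rng.expovariate(true_lam)   # survival time drawn from f_w
    c = rng.expovariate(1.0)        # hypothetical independent censoring time
    times.append(min(t, c))         # observed time
    flags.append(1 if t < c else 0) # I_i = 1 if uncensored

# Setting the derivative of log_lik to zero gives the closed-form maximizer.
lam_hat = sum(flags) / sum(times)
print(lam_hat)
```

For richer parametric families the maximizer usually has no closed form, and `log_lik` would be passed to a numerical optimizer instead.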

The estimation problem becomes more interesting and challenging when $f(t)$ is
assumed to be smooth while $w(t)$ is a parametric function. This semi-parametric
model is more general than a parametric model and is useful when there is no
obvious choice for a parametric assumption on $f$. So, we assume next:

Model Assumption 2.2. For the biased survival data in Model Assumption 2.1,
the pdf $f$ is assumed to be smooth and the selection function $w(t)$, denoted hereafter
by $w(t, \theta)$, is specified up to an unknown finite dimensional parameter
$\theta \in \Theta$. Hence the weighted density is now

$$f_w(t) = \frac{w(t, \theta) f(t)}{\kappa(\theta)}, \tag{3}$$

where $\kappa(\theta) = E_f(w(T, \theta))$ with the expectation taken for $T \sim f$.

The semi-parametric model specified by (3) is related to those considered by
Gill et al., Robbins and Zhang [14], and Vardi and Zhang, among others.
A notable difference is that in our semi-parametric model (which satisfies Model
Assumptions 2.1 and 2.2), $f$ is assumed smooth so that nonparametric smoothing
techniques can be used in estimating $f$, and a sieve estimate of $\theta$ based on a
pseudo likelihood can be developed, as in Section 3.

Let $S(\theta) = \{t : w(t, \theta) > 0\}$ be the support of $w(t, \theta)$. If
$S(\theta)$ depends on $\theta$ in that both sets $S(\theta) - S(\theta')$ and
$S(\theta') - S(\theta)$ have positive measure under $F$ for all
$\theta \ne \theta' \in \Theta$, then both $\theta$ and $f$ are completely identifiable,
as shown by Lemma 2.3 of Gilbert et al. In practice, the selection function may be a
polynomial function in $t$, e.g. $w(t, \theta) \propto t^{\theta}$, $\theta \ge 0$, and
$w(t, 0)$ is a constant. This $w$ has support $(0, \infty)$, which is independent of
$\theta$. So it does not satisfy the condition of this Lemma 2.3. However, we can put a
constraint on the form of $w(t, \theta)$ and the type of $f(t)$, for each fixed sample
size. Then the resulting semi-parametric estimator, under the sieve identifiability
defined in Section 3, will be similar to those obtained by "sieve" methods and hence
will lead to reasonable estimators of $\theta$ and $f$. See Section 3.

Alternatively, if one can model $f$ as a parametric function, the assumption for
$w$ can be relaxed to be nonparametric. In Sun and Woodroofe, $f$ is assumed
to come from an exponential family of distributions and $w$ is assumed only to be
monotone. They also developed an iterative MM (maximization-maximization)
algorithm for estimating both $w$ and the parameter in $f$ when $N$ in the idealized
situation is known and when it is unknown (two very different cases). They showed that
the MM algorithm converges to correct (penalized) maximum likelihood estimators and the
estimators are consistent. This type of semi-parametric model is dual to the semi- 
parametric model proposed above and may be extended to allow for censored ob- 
servations. We do not consider this extension in this paper. For a recent tutorial on 
MM algorithms under other settings, see 



3. Semi-parametric estimators 

In this section, we develop semi-parametric estimators of $(f, \theta)$ under Model
Assumptions 2.1 and 2.2, and discuss an additional identifiability condition required
for our estimation procedure.



3.1. Weighted kernel estimator (WKE) 

The bias due to censoring can be corrected in a standard kernel estimator by
weighting with the jumps of the K-M estimator, which carry the censoring information,
as proposed by Marron and Padgett. The basic idea is as follows. Order the sample
$(T, I)$ with respect to $T$ and denote it by
$\{(t_{(i)}, I_{[i]}),\ i = 1, \ldots, n\}$. Then the K-M estimate of the cdf is

$$\hat{F}_{km}(t) = \begin{cases} 0, & 0 \le t \le t_{(1)}, \\ 1 - \prod_{i=1}^{j-1} \left(\dfrac{n-i}{n-i+1}\right)^{I_{[i]}}, & t_{(j-1)} < t \le t_{(j)},\ j = 2, \ldots, n, \\ 1, & t > t_{(n)}. \end{cases} \tag{4}$$

The Marron and Padgett kernel estimator of $f(t)$ induced by $\hat{F}_{km}$ is then

$$\int K_h(t - z)\, d\hat{F}_{km}(z) = \sum_{i=1}^{n} s_i K_h(t - t_{(i)}),$$

where $K_h(t) = (1/h) K(t/h)$, $K$ is a symmetric probability density kernel such as
the $N(0, 1)$ density function, and $s_i$ is the size of the jump of $\hat{F}_{km}$ in
(4) at $t_{(i)}$.

We can correct the selection bias by replacing the weight $s_i$ with
$s_i / w(t_{(i)}, \hat{\theta})$. Therefore, a new weighted kernel estimator is proposed,

$$\hat{f}_{wk}(t) = \hat{\kappa}_{wk} \sum_{i=1}^{n} \frac{s_i K_h(t - t_{(i)})}{w(t_{(i)}, \hat{\theta})}, \tag{5}$$

where $\hat{\kappa}_{wk}$ is a normalizing constant such that
$\hat{\kappa}_{wk}^{-1} = \sum_{i=1}^{n} s_i / w(t_{(i)}, \hat{\theta})$, and
$\hat{\theta}$ is a good estimate of $\theta$, such as the one described in Section 3.3.
If $w(t, \theta) \propto t$ is a known length-biased selection function,
$\hat{f}_{wk}(t)$ in (5) reduces to the Jones estimate [8]. See the comparisons of
$\hat{f}_{wk}$ with the Jones estimate in Section 4.
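
The construction of the WKE can be sketched in a few lines of code. This is a minimal illustration, assuming a Gaussian kernel, a fixed bandwidth `h`, no special handling of tied observations, and a user-supplied selection function `w`; the helper names `km_jumps`, `gauss_kernel` and `wke` are hypothetical.

```python
import math

def km_jumps(times, flags):
    """Order (t_i, I_i) by time and return the ordered times together with the
    Kaplan-Meier jump sizes s_i (ties are not treated specially)."""
    data = sorted(zip(times, flags))
    n = len(data)
    surv, jumps = 1.0, []
    for i, (_, flag) in enumerate(data, start=1):
        new_surv = surv * ((n - i) / (n - i + 1)) ** flag
        jumps.append(surv - new_surv)
        surv = new_surv
    jumps[-1] += surv   # the cdf equals 1 beyond t_(n): leftover mass goes to t_(n)
    return [t for t, _ in data], jumps

def gauss_kernel(u, h):
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))

def wke(t, times, flags, w, h):
    """Weighted kernel estimate at t: K-M jumps reweighted by 1 / w(t_(i))."""
    ordered, jumps = km_jumps(times, flags)
    weights = [s / w(x) for x, s in zip(ordered, jumps)]
    kappa_hat = 1.0 / sum(weights)
    return kappa_hat * sum(a * gauss_kernel(t - x, h) for x, a in zip(ordered, weights))

# Toy data (hypothetical): flag 1 = uncensored, flag 0 = censored.
times = [0.4, 0.7, 0.9, 1.1, 1.5, 1.8]
flags = [1, 1, 0, 1, 1, 0]
grid = [-2.0 + 0.01 * k for k in range(701)]
mass = 0.01 * sum(wke(t, times, flags, lambda x: x ** 0.5, h=0.3) for t in grid)
print(mass)   # Riemann sum of the estimate over the grid
```

With a constant `w` and no censoring, every jump equals $1/n$ and the function reduces to the ordinary kernel density estimate.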



3.2. Transformation based estimator (TBE) 

Another way to correct both the selection and censoring biases is to use the
transformation-based method of El Barmi and Simonoff to correct for the selection
bias and, simultaneously, to use $s_i$ from the K-M estimate of the transformed
variable to account for the censoring bias.

Let $g(y)$ be the density function of $Y = W(T) = W(T, \theta)$, where $T \sim f_w$ and

$$W(t, \theta) = \int_0^t w(u, \theta)\,du$$

is the cumulative selection function. For example, if $w(t, \theta) = c \cdot t^{\theta}$
for a constant $c > 0$ and $\theta \ge 0$, then
$W(t, \theta) = c \cdot t^{\theta+1}/(\theta + 1)$ is monotone in $t$ on $[0, \infty)$.
The cumulative distribution function of $Y$ can be easily shown to be

$$G(y) = F_w(W^{-1}(y)), \tag{6}$$

where $W^{-1}(y)$ is the inverse function of $W(t, \theta)$ for fixed $\theta$ and
$F_w(x) = \int_0^x f_w(u)\,du$ is the cdf of $f_w$. Differentiating $G(y)$, we obtain
the pdf $g(y) = f(W^{-1}(y))/\kappa$. Thus,

$$f(t) = \kappa \cdot g(W(t, \theta)).$$

Hence, for fixed $\theta$, let $(Y, I) = \{(Y_i, I_i),\ i = 1, \ldots, n\}$, where
$Y_i = W(t_i, \theta)$. Order this sample with respect to $Y$ and denote it by
$\{(Y_{(i)}, I_{[i]})\}$. Then the K-M estimator of the cdf of $Y$ is

$$\hat{F}_{km}(y) = \begin{cases} 0, & 0 \le y \le Y_{(1)}, \\ 1 - \prod_{i=1}^{j-1} \left(\dfrac{n-i}{n-i+1}\right)^{I_{[i]}}, & Y_{(j-1)} < y \le Y_{(j)},\ j = 2, \ldots, n, \\ 1, & y > Y_{(n)}. \end{cases}$$

Let $s_i$ denote the jump size of this K-M estimate at $Y_{(i)}$. Then our proposed
transformation based estimator is

$$\hat{f}_{tb}(t) = \hat{\kappa}_{tb} \sum_{i=1}^{n} s_i K_h(W(t, \theta) - Y_{(i)}), \tag{7}$$

where $\hat{\kappa}_{tb}$ is a normalizing constant such that
$\hat{\kappa}_{tb}^{-1} = \int \sum_{i=1}^{n} s_i K_h(W(t, \theta) - Y_{(i)})\,dt$. Here
$\theta$ is replaced by a good estimate $\hat{\theta}$ when $\theta$ is unknown. See the
next section for an estimate of $\theta$. If $\theta$ is known and $s_i = 1/n$ for all
$i$, $\hat{f}_{tb}(t)$ reduces to the El Barmi and Simonoff estimate.
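
A corresponding sketch of the TBE is given below. It assumes the power selection function $w(t, \theta) = t^{\theta}$ (so $c = 1$), a Gaussian kernel whose bandwidth `h` lives on the transformed scale, and a Riemann-sum normalization over a finite grid; the names `km_jumps` and `tbe_curve` are hypothetical.

```python
import math

def km_jumps(values, flags):
    """Kaplan-Meier jump sizes for the ordered sample (ties not treated specially)."""
    data = sorted(zip(values, flags))
    n = len(data)
    surv, jumps = 1.0, []
    for i, (_, flag) in enumerate(data, start=1):
        new_surv = surv * ((n - i) / (n - i + 1)) ** flag
        jumps.append(surv - new_surv)
        surv = new_surv
    jumps[-1] += surv   # leftover mass is placed at the largest observation
    return [v for v, _ in data], jumps

def tbe_curve(times, flags, theta, h, grid):
    """Transformation-based estimate on an equally spaced grid, assuming
    w(t, theta) = t ** theta, i.e. W(t, theta) = t ** (theta + 1) / (theta + 1)."""
    W = lambda t: t ** (theta + 1.0) / (theta + 1.0)
    ys, jumps = km_jumps([W(t) for t in times], flags)
    K = lambda u: math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))
    raw = [sum(s * K(W(t) - y) for y, s in zip(ys, jumps)) for t in grid]
    step = grid[1] - grid[0]
    kappa_hat = 1.0 / (step * sum(raw))   # numerical normalizing constant
    return [kappa_hat * r for r in raw]

# Toy data (hypothetical): flag 1 = uncensored, flag 0 = censored.
times = [0.4, 0.7, 0.9, 1.1, 1.5, 1.8]
flags = [1, 1, 0, 1, 1, 0]
grid = [0.005 + 0.01 * k for k in range(400)]
curve = tbe_curve(times, flags, theta=0.5, h=0.2, grid=grid)
```

Smoothing is done on the scale of $Y = W(T, \theta)$ and then mapped back to $t$, which is one reason the resulting curve can look rougher than the WKE for the same data.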



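3.3. Estimator of $\theta$

If $\theta$ is unknown, we propose to estimate it by maximizing a corresponding "pseudo"
or "sieve" log-likelihood:

$$l_{wk}(\theta) = \sum_{j=1}^{n} I_j \log[\hat{f}_{wk}(t_j, \theta)] + \sum_{j=1}^{n} (1 - I_j) \log[\hat{S}_{wk}(t_j, \theta)], \quad \text{or} \tag{8}$$

$$l_{tb}(\theta) = \sum_{j=1}^{n} I_j \log[\hat{f}_{tb}(t_j, \theta)] + \sum_{j=1}^{n} (1 - I_j) \log[\hat{S}_{tb}(t_j, \theta)], \tag{9}$$

where $\hat{f}_{wk}(t, \theta) = \hat{f}_{wk}(t)$ and
$\hat{f}_{tb}(t, \theta) = \hat{f}_{tb}(t)$ are defined in (5) and (7) with
$\hat{\theta}$ replaced by $\theta$, and $\hat{S}_{wk}(t_j, \theta)$ and
$\hat{S}_{tb}(t_j, \theta)$ are the survival functions at point $t_j$ for the two
methods respectively,

$$\hat{S}_{wk}(t, \theta) = 1 - \int_0^t \hat{f}_{wk}(u, \theta)\,du, \qquad \hat{S}_{tb}(t, \theta) = 1 - \int_0^t \hat{f}_{tb}(u, \theta)\,du.$$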
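
To make the pseudo log-likelihood (8) concrete, the sketch below evaluates it on a grid of $\theta$ values for the WKE. It is only an illustration under assumptions: $w(t, \theta) = t^{\theta}$, a Gaussian kernel with a fixed bandwidth `h`, a plain grid search, and hypothetical helper names (`km_jumps`, `norm_cdf`, `pseudo_loglik_wk`).

```python
import math

def km_jumps(values, flags):
    """Kaplan-Meier jump sizes for the ordered sample (ties not treated specially)."""
    data = sorted(zip(values, flags))
    n = len(data)
    surv, jumps = 1.0, []
    for i, (_, flag) in enumerate(data, start=1):
        new_surv = surv * ((n - i) / (n - i + 1)) ** flag
        jumps.append(surv - new_surv)
        surv = new_surv
    jumps[-1] += surv
    return [v for v, _ in data], jumps

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pseudo_loglik_wk(theta, times, flags, h):
    """Pseudo log-likelihood (8), assuming w(t, theta) = t ** theta."""
    ordered, jumps = km_jumps(times, flags)
    weights = [s / x ** theta for x, s in zip(ordered, jumps)]
    total = sum(weights)
    weights = [a / total for a in weights]   # normalizing constant absorbed here
    value = 0.0
    for t, flag in zip(times, flags):
        if flag:   # uncensored point: log of the weighted kernel density at t
            dens = sum(a * math.exp(-0.5 * ((t - x) / h) ** 2)
                       for x, a in zip(ordered, weights))
            value += math.log(dens / (h * math.sqrt(2.0 * math.pi)))
        else:      # censored point: log of 1 - integral of the density over (0, t)
            cdf = sum(a * (norm_cdf((t - x) / h) - norm_cdf(-x / h))
                      for x, a in zip(ordered, weights))
            value += math.log(1.0 - cdf)
    return value

# Toy data (hypothetical): flag 1 = uncensored, flag 0 = censored.
times = [0.4, 0.7, 0.9, 1.1, 1.5, 1.8]
flags = [1, 1, 0, 1, 1, 0]
thetas = [0.25 * k for k in range(7)]
values = [pseudo_loglik_wk(th, times, flags, h=0.3) for th in thetas]
theta_hat = thetas[values.index(max(values))]
```

On small samples the grid maximizer can sit at an end of the search range, so the grid limits should be chosen with care.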

In the rest of this paper, the following "sieve" identifiability is assumed:

Model Assumption 3.1 (Sieve identifiability). The semi-parametric model
with unknown parameters $\theta$ and $f$ is "sieve" identifiable in the following sense:

$$l_{wk}(\theta_1) = l_{wk}(\theta_2) \ \text{for a.s. all } t_i \in \mathcal{R}_+ \ \Longrightarrow\ \theta_1 = \theta_2,$$

$$l_{tb}(\theta_1) = l_{tb}(\theta_2) \ \text{for a.s. all } t_i \in \mathcal{R}_+ \ \Longrightarrow\ \theta_1 = \theta_2,$$

where $\mathcal{R}_+$ is the support of $f$. For practical purposes, in the
one-dimensional case, $\mathcal{R}_+$ can be taken as $(0, a)$ for some large $a > 0$.

This type of identifiability ensures that $\theta$ is identifiable under the sieve
likelihood (8) or (9), respectively, and that the mle of $\theta$ from the
corresponding sieve likelihood exists. Call the $\hat{\theta}$ which maximizes the
sieve likelihood the pseudo mle. Since the sieve likelihood is usually a good
approximation to the true likelihood as $n \to \infty$, we expect our WKE and TBE of
$f$ based on $\hat{\theta}$ to be consistent. This is very much in the same spirit as
that of a histogram estimator. A properly chosen histogram estimator is consistent
for $f$ under some regularity conditions, while the fully nonparametric mle of $f$,
which places a delta function at every data point, is a useless and inconsistent
estimate of $f$. The consistency of our WKE and TBE is supported by the simulation
results in Table 1 in Section 4. Our final WK and TB estimators of $f$ are
$\hat{f}_{wk}(t, \hat{\theta}_{wk})$ and $\hat{f}_{tb}(t, \hat{\theta}_{tb})$, where
$\hat{\theta}_{wk}$ and $\hat{\theta}_{tb}$ are the respective pseudo mle's from the
corresponding WK and TB sieve likelihoods (8) and (9). See Section 6 for a further
discussion on the asymptotic justification of our proposed procedures.

In some extreme cases, or when the sample size is not large enough, the optimal
value of $\theta$ may be located at the edge of the specified range $\Theta$ of
$\theta$. The penalized log-likelihoods of the form

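$$l^*_{wk} = \log L_{wk} = \sum_{j \in U} \log \hat{f}_{wk}(t_j, \theta) + \sum_{j \in C} \log \hat{S}_{wk}(t_j, \theta) - \frac{\alpha}{\hat{\kappa}_{wk}}, \tag{10}$$

$$l^*_{tb} = \log L_{tb} = \sum_{j \in U} \log \hat{f}_{tb}(t_j, \theta) + \sum_{j \in C} \log \hat{S}_{tb}(t_j, \theta) - \frac{\alpha}{\hat{\kappa}_{tb}}, \tag{11}$$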

are then considered to overcome this difficulty, where $C = \{i : I_i = 0\}$ and
$U = \{i : I_i = 1\}$, and $0 < \alpha < 1$ may approach zero as $n \to \infty$, as
discussed in detail by Woodroofe and Sun [26]. This penalized log-likelihood is
maximized subject to the constraint

$$w(t, \theta) \ge \epsilon \ \text{ for all } t \in \mathcal{R}_+,\ \epsilon > 0, \quad \text{and} \quad \sup_t w(t, \theta) = 1.$$

Under this constraint, "$w(t, \theta) \propto t$" means that $w(t, \theta)$ is only
proportional to $t$ in $(\epsilon, a - \epsilon)$ for some $\epsilon > 0$.

In this study, we take a = cn^'^'^, where c is a constant and its value can be 
chosen by the Jackknife or Cross-validation method. In this paper, we choose c by 
minimizing either of the following expressions, 

(12) CVi = ^ Yi^-^,c - 0.,c)^ + in- 1)\0, - 0.,,)\ 

n ^ — ' 

(13) CV2 = -Yf-^At^), 

n ^ — ^ 

i 

(14) cvs - ^ ^ , V(k;_,,c - Kef + (n - ifiK - Ti.,cA , 

Kc(,t Kc) n J 

where the subscript "$-i$" means that the $i$th data point has been omitted and the
subscript "$\cdot$" denotes an average of the corresponding "$-i$" quantities. The CV
estimation of $c$ can be computationally intensive. For large data sets, the fast
Fourier transform may be implemented to speed up the algorithm [17].



4. Simulation Studies 
4.1. Setup

In this simulation study, we consider a Weibull density with shape parameter
$\gamma = 2$ and scale parameter $\lambda = 1$,

$$f(t, \gamma, \lambda) = \lambda\gamma\, t^{\gamma-1} \exp(-\lambda t^{\gamma}). \tag{15}$$

The solid line in Figure 1 shows the density curve of $f$ defined in (15).

To show the results of ignoring either sampling or censoring biases in a typical
density estimate, we draw four samples using the following four designs. The kernel
density curves of these four samples (without a correction for either selection or
censoring biases), $\hat{f}(t) = \frac{1}{nh} \sum_i K((t - t_i)/h)$ for $t_i \in S$,
are shown in Figure 1.

[Figure 1 legend: true density curve; est. by a simple random sample; est. by a
censored sample; est. by a biased sample; est. by a biased & censored sample.]

Fig 1. Kernel estimates of densities for samples with sampling bias and/or censoring.



• Simple random sample ($S = S_0$): A simple random sample of size 3000 was
drawn from $f$. The density curve of this sample was estimated by using the
standard kernel method and is shown by the short-dashed curve in Figure 1.
As expected, this curve is close to the true density curve of $f$.

• Sample with censoring bias only ($S = S_c$): A sample of size 3000 was randomly
selected from $f$ and 30% of the data points were randomly censored. As shown
in Figure 1, the kernel density curve (dotted curve) of sample $S_c$ shifts to the
left of the true density curve $f$. This is also expected, as now the sampling
distribution (of $S_c$) is different from the target distribution ($f$). This is
typical for right-censored survival data.

• Sample with selection bias only ($S = S_b$): A sample of size 3000 was randomly
chosen from the population. Each of these 3000 elements was observed subject
to the selection probability $w(t, \theta) = w(t, 0.5) \propto \sqrt{t}$. This
$w(t, \theta)$ implies that the elements with longer survival times were more likely
to be sampled. The kernel density estimate of the density curve of sample $S_b$ was
computed and is shown as the dash-dotted line in Figure 1. We see that the sample
density curve shifts to the right. This is also a case in which the sampling
distribution is different from the target distribution.

• Sample with both censoring and selection biases ($S = S_{cb}$): In sample $S_b$,
if 30% of the data are further randomly censored, we obtain a biased survival sample.
The density curve of $S_{cb}$ is estimated and shown as the long-dashed curve in
Figure 1. We find that the selection bias in this case has to some extent balanced
out the left shift of the density curve of $S_c$, though the result is still not as
good as the estimate based on a simple random sample from $f$. We cannot rely on
this kind of cancellation. If $w(t, \theta)$ had decreased with the increase of $t$,
the selection bias would make the sample density curve more right-skewed.

The observed sample sizes $n$ were governed by the selection function and
censoring scheme; they varied from one realization to another.
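
The four designs above can be mimicked with a short simulation. This is only a sketch: the cap `a` in the selection probability and the censoring rule (a censored point is recorded at a uniformly drawn earlier time) are assumptions made for illustration, since the exact mechanisms are not spelled out here; the name `simulate_designs` is hypothetical.

```python
import random

def simulate_designs(N=3000, theta=0.5, a=3.0, p_cens=0.3, seed=0):
    """Return recorded times for the four designs S_0, S_c, S_b, S_cb."""
    rng = random.Random(seed)

    def weibull():
        return rng.weibullvariate(1.0, 2.0)            # shape 2, scale 1

    def keep(t):                                       # w(t) = min(1, (t/a)**theta)
        return rng.random() < min(1.0, (t / a) ** theta)

    def censor(t):                                     # hypothetical censoring rule
        return rng.uniform(0.0, t) if rng.random() < p_cens else t

    s0 = [weibull() for _ in range(N)]                 # simple random sample
    sc = [censor(weibull()) for _ in range(N)]         # censoring only
    sb = [t for t in (weibull() for _ in range(N)) if keep(t)]   # selection only
    scb = [censor(t) for t in sb]                      # selection, then censoring
    return s0, sc, sb, scb

s0, sc, sb, scb = simulate_designs()
means = {name: sum(xs) / len(xs)
         for name, xs in [("S0", s0), ("Sc", sc), ("Sb", sb), ("Scb", scb)]}
print(means)
```

Treating the recorded times as if they were exact, the censored sample has a smaller mean and the selection-biased sample a larger mean than the simple random sample, which mirrors the left and right shifts described above.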

The results from these four experiments show that if a sampling distribution is 
different from a target distribution, then the deviation of the sampling distribution 
from the target distribution must be considered in developing a good estimate of 
the target density function; otherwise the resulting estimator is inconsistent.

4.2. Estimates based on a biased and censored sample

Using a biased sample that has some of the data points right-censored, we can
estimate $f$ and $w$ by WKE and TBE. First, we estimate the unknown parameter $\theta$
in the selection function $w(t, \theta)$ by maximizing the log-likelihoods in (8) and
(9) or the penalized log-likelihoods in (10) and (11).

Figure 2 shows the pseudo maximum likelihood estimates of the unknown parameter
$\theta$. By using the WKE, we obtained an estimate $\hat{\theta} = 0.42$ (plot 2(a)),
which is closer to the true $\theta = 0.5$ than that obtained by using the TBE
(plot 2(b)). We can then estimate $f$ by using the estimates in (5) and (7),
replacing the $\theta$'s with $\hat{\theta}$'s. In Figure 3, the thick solid line shows
the true density curve of $f$, while the dashed line shows the WKE obtained by
treating the true parameter $\theta$ as known ($\theta = 0.5$) and the thin solid line
shows the WKE obtained by using the estimated parameter $\hat{\theta} = 0.42$. The
kernel density curve of the sample is also plotted as the dot-dashed line. From this
figure, we can see that the three density curves are close and the WKE's based on
the known $\theta$ and on $\hat{\theta}$ are only slightly better. However, this result
is based on only one sample. See the next subsection for a report on the overall
performance.

Fig 2. Maximum likelihood estimators of $\theta$.

Fig 3. Weighted kernel estimates of f.

Fig 4. Transformation based estimates of f.

The TBE's obtained by using the true and estimated $\theta$ are displayed in Figure 4.
From Figure 4, we can see that by using the true parameter $\theta$, we obtained an
estimate (dashed line) which is close to the density curve of the sample (biased
with censoring), while by using the estimated parameter $\hat{\theta} = 0.898$, we
obtained an estimate which is much closer to the true density curve $f$. The reason
for this superiority is perhaps that fixing $\theta = 0.5$ may have limited some
degrees of freedom of the semi-parametric approach; data can speak for themselves.
The curves of the TBE's are closer to the true density curve but coarser than those
of the WKE's. This is expected because the TBE corrects for the selection bias and
censoring bias in exactly the same order as that in which biased survival data are
formed, and the coarseness may come from the transformation or the way the
window-width is determined. Further improvement is possible by applying some
smoothing techniques to the TBE.

Hence, as an estimate of $f$, the TBE is the winner; it is a bit rough, but it can
be smoothed one more time.



4.3. Overall performance of WKE and TBE

To study the overall performance of the weighted kernel estimator and
transformation-based estimator, we designed two experiments. The first experiment
has the following design:

Step 1: Draw a sample $S$ of size $N = 50$ from $f$, subject to biased sampling
with a selection function $w(t, \theta) = w(t, 1.0)$, and with 30% of the data points
censored.

Step 2: Based on this sample $S$, estimate the cdf $F(t)$ by using the WKE, TBE,
Jones estimate, and naive estimate (with which we estimate the density function
from the biased survival data, but without considering either the selection or
censoring biases); denote the results by $\hat{F}_{wke}$, $\hat{F}_{tbe}$,
$\hat{F}_{jones}$, $\hat{F}_{naive}$ respectively.

Step 3: Repeat step 1 and step 2 1000 times. Compute $L_1$, $L_2$ and MSE as
defined by

$$L_1 = \sum_i |\hat{F}(t_i) - F(t_i)|\, d_i, \qquad L_2 = \sum_i (\hat{F}(t_i) - F(t_i))^2\, d_i,$$

$$\text{MSE} = \frac{1}{n} \sum_i (\hat{F}(t_i) - F(t_i))^2,$$

where $d_i = (t_{i+1} - t_{i-1})/2$ for $i = 2, 3, \ldots, n-1$ and $d_1 = t_2 - t_1$,
$d_n = t_n - t_{n-1}$. So, $L_1$ and $L_2$ above are approximations of the $L_1$ and
$L_2$ distances (in the form of integrals).

Step 4: Take other values of $N$, $N = 100, 200, \ldots$, and repeat step 1 through
step 3.
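
The summaries in Step 3 can be computed as follows. This sketch assumes the reading $L_1 = \sum_i |\hat{F}(t_i) - F(t_i)|\, d_i$ and $L_2 = \sum_i (\hat{F}(t_i) - F(t_i))^2 d_i$ with the spacings $d_i$ defined in Step 3; the function name `distance_summaries` and the toy inputs are hypothetical.

```python
import math

def distance_summaries(ts, F_hat, F_true):
    """L1, L2 and MSE between two cdfs on sorted points ts (len(ts) >= 2).
    F_hat and F_true are callables returning cdf values."""
    n = len(ts)
    d = ([ts[1] - ts[0]]
         + [(ts[i + 1] - ts[i - 1]) / 2 for i in range(1, n - 1)]
         + [ts[-1] - ts[-2]])                       # spacings d_1, ..., d_n
    diff = [F_hat(t) - F_true(t) for t in ts]
    l1 = sum(abs(e) * w for e, w in zip(diff, d))
    l2 = sum(e * e * w for e, w in zip(diff, d))
    mse = sum(e * e for e in diff) / n
    return l1, l2, mse

# Toy check: an estimate that is off by a constant 0.1 at every point.
F_true = lambda t: 1.0 - math.exp(-t * t)           # Weibull cdf, shape 2, scale 1
F_hat = lambda t: F_true(t) + 0.1
l1, l2, mse = distance_summaries([0.5, 1.0, 1.5, 2.0], F_hat, F_true)
```

In the toy check all four spacings equal 0.5, so the three summaries are approximately 0.2, 0.02 and 0.01.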

Table 1 shows the $L_1$ and $L_2$ distances and the MSE of each estimate from the
truth. Note that since the true $\theta = 1$ in this case, the assumptions used in the
Jones estimate are justified and the Jones estimate is equivalent to the WKE with a
known $\theta$. From this table, we see that the $L_1$, $L_2$ distances and MSE's of
the WKE, TBE and Jones estimates decrease as $n$ increases while those of the naive
estimate do not decrease steadily. Also, the $L_1$, $L_2$ distances and MSE's of the
WKE and TBE are much smaller than those of the Jones estimate, which is consistent
with the findings in the previous subsection that the WKE and TBE of $f$ based on
the estimated $\hat{\theta}$ perform better than the ones based on the known $\theta$.



Table 1
Comparisons of estimates

Population              L1               L2               MSE
size        Estimate    mean    sd       mean    sd       mean    sd
-----------------------------------------------------------------------
50          tbe         0.279   0.126    0.066   0.056    0.037   0.034
            wke         0.277   0.135    0.064   0.056    0.037   0.034
            jones       0.345   0.133    0.091   0.064    0.051   0.039
            naive       4.837   1.677    1.999   0.900    1.080   0.422
100         tbe         0.239   0.100    0.045   0.037    0.024   0.022
            wke         0.260   0.126    0.051   0.045    0.028   0.025
            jones       0.345   0.109    0.079   0.046    0.041   0.026
            naive       3.023   0.868    1.340   0.495    0.651   0.215
200         tbe         0.192   0.072    0.029   0.021    0.014   0.012
            wke         0.220   0.105    0.035   0.030    0.019   0.017
            jones       0.334   0.083    0.068   0.031    0.035   0.017
            naive       1.808   0.430    0.882   0.272    0.406   0.111
400         tbe         0.156   0.045    0.019   0.010    0.009   0.006
            wke         0.179   0.091    0.024   0.022    0.013   0.012
            jones       0.328   0.060    0.063   0.021    0.032   0.012
            naive       0.613   0.178    0.261   0.110    0.170   0.077
800         tbe         0.140   0.039    0.016   0.008    0.007   0.004
            wke         0.148   0.073    0.016   0.016    0.009   0.008
            jones       0.322   0.044    0.058   0.015    0.030   0.008
            naive       1.109   0.207    0.647   0.176    0.185   0.023
1600        tbe         0.124   0.033    0.012   0.006    0.005   0.003
            wke         0.126   0.060    0.012   0.011    0.006   0.006
            jones       0.323   0.032    0.058   0.011    0.030   0.006
            naive       0.838   0.317    0.396   0.218    0.191   0.028
3000        tbe         0.118   0.034    0.011   0.006    0.005   0.003
            wke         0.107   0.048    0.009   0.008    0.004   0.004
            jones       0.319   0.024    0.056   0.008    0.028   0.005
            naive       0.855   0.263    0.425   0.177    0.199   0.021

[Figure 5 panel: 95% Confidence Bands (WKE).]

Fig 5. Weighted kernel estimates of f.

[Figure 6 panel: 95% Confidence Bands (TBE); horizontal axis: Survival Time.]

Fig 6. Weighted kernel estimates of f.

In our second experiment, we take $\theta = 0.5$ and repeat step 1 through step 3 and
then compute the 95% pointwise confidence bands (based on the 2.5 and 97.5
percentage points from the repeats for each point) for the TBE, WKE, Jones estimate,
the Kaplan-Meier estimate and the naive estimate. In this case, the length-biased
assumption ($\theta = 1$) assumed in the Jones estimate is off from the true
$\theta = 0.5$. From Figure 5 and Figure 6, we can easily see that when both selection
bias and censoring bias exist, only our 95% confidence bands from TBE and WKE cover
completely the true cdf of the survival times (the solid curve in the middle). The
Jones estimate, the Kaplan-Meier estimate, and the naive estimate under-estimated
$F(t)$ substantially. In Figure 5, we select the constant $c$ by one of the criteria
in (12)-(14), and in Figure 6, we select the constant $c$ by the criterion in (13).
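
Pointwise percentile bands of this kind can be computed from the replicated curves as sketched below. The nearest-rank percentile rule and the name `pointwise_band` are assumptions made for illustration; other percentile conventions would give slightly different band edges.

```python
import math

def pointwise_band(curves, lower=0.025, upper=0.975):
    """curves: list of replicated estimates, each a list over the same grid.
    Returns (lo, hi): empirical lower and upper percentiles at each grid point."""
    m = len(curves)
    lo, hi = [], []
    for k in range(len(curves[0])):
        column = sorted(c[k] for c in curves)          # replicates at grid point k
        lo.append(column[int(math.floor(lower * (m - 1)))])
        hi.append(column[int(math.ceil(upper * (m - 1)))])
    return lo, hi

# Toy replicates (hypothetical): 101 curves evaluated at two grid points.
curves = [[i / 100, 1 + i / 100] for i in range(101)]
lo, hi = pointwise_band(curves)
```

A true curve is said to be covered by the band when it lies between `lo` and `hi` at every grid point.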

4.4. Remarks

A direct plug-in methodology was used to select the bandwidth in our study.
This algorithm is built into the R package KernSmooth 2.22.

For simplicity, we used the Gaussian kernel here. Other kernels, such as
boundary kernels, can also be used to correct the boundary effect for survival
data (survival times are never negative).

Some rough knowledge about $\kappa$ or $\theta$ might be used to restrict the range
of the search for $c$. Here we restrict $c \in (1, 20)$.

5. An application 

In a data set of bone marrow transplantation, there are a total of 137 patients, who 
were treated at one of four hospitals. The study involves transplants conducted 
at these institutions from March 1, 1984, to June 30, 1989. The maximum follow- 
up was 7 years. There were 42 patients who relapsed and 41 who died while in 
remission. Twenty-six patients had an episode of acute Graft-versus-host disease, 
and 17 patients either relapsed or died in remission without their platelets returning 
to normal levels [10].

Several potential risk factors were measured at the time of transplantation. For 
each disease, patients were grouped into risk categories based on their status at 
the time of transplantation. The categories were as follows: acute lymphoblastic 
leukemia (ALL), acute myelocytic leukemia (AML) low-risk and AML high-risk. 
Here we will focus on the disease-free survival probabilities for ALL, AML low-risk 
and AML high-risk patients. An individual is said to be disease-free at a given time 
after transplant if that individual is alive without the recurrence of leukemia. 

There are 38 patients in group ALL, 54 patients in group AML low-risk and 45 
patients in group AML high-risk. Figure 7 shows the K-M estimates of the cumulative 
distribution function F(t) for the three groups. Because the largest observed times 
differ across the three groups, the three estimates end at different points. The 
figure also suggests that, if no sampling bias exists in the data, the patients in 
group AML low-risk have the most favorable prognosis (dashed line), while the 
AML high-risk group has the least favorable prognosis (dash-dotted line). 
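
For readers who want to reproduce this kind of curve, a minimal sketch of the Kaplan-Meier estimate of F(t) = 1 - S(t) follows. The function name and the toy data are hypothetical; the bone marrow data are not reproduced here.

```python
import numpy as np

def kaplan_meier_cdf(times, events):
    """Kaplan-Meier estimate of F(t) = 1 - S(t) at the distinct event times.

    events[i] is 1 if the failure was observed and 0 if right-censored.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    surv = 1.0
    cdf = []
    for t in event_times:
        at_risk = np.sum(times >= t)                      # still under observation
        failures = np.sum((times == t) & (events == 1))   # observed failures at t
        surv *= 1.0 - failures / at_risk
        cdf.append(1.0 - surv)
    return event_times, np.array(cdf)

# Toy data: six subjects, two of them censored (at times 3 and 7).
toy_times = [2, 3, 3, 5, 7, 8]
toy_events = [1, 1, 0, 1, 0, 1]
t_ev, F_hat = kaplan_meier_cdf(toy_times, toy_events)
```

The estimate only changes at observed event times, so a curve stops at the largest observed time of its group. This is why the three curves end at different points.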

Now we apply the new method to the above data, taking the possible selection 
bias into account. Six different estimates of θ can be obtained, by combining the 
TBE and the WKE with the three different cross-validation criteria defined earlier. 
These six methods do not always give exactly the same value, so which one 
should we use? Simulation studies were performed to answer this question. We 
first generated data sets similar to the data of the three groups, with similar 
distributions and the same proportion of censored data points. Second, these three 
data sets were resampled under selection functions w(t, θ) with different θ values 
(θ = 0.5, 1, 1.5), with sample sizes identical to those of groups ALL, AML low-risk 
and AML high-risk. Finally, the θ's were estimated from those simulated samples 
with the new estimators (six different combinations). The above procedure was 
repeated 100 times for each fixed θ value. We found that the WKE with the CV3 
criterion performed best for all θ values. For different applications, the 
conclusions may vary. The estimated selection functions for these three groups are 



\begin{align}
w(t) &\propto t^{0.45}, && \text{ALL}; \tag{16} \\
w(t) &\propto t^{0.89}, && \text{AML low risk}; \tag{17} \\
w(t) &\propto t^{0.89}, && \text{AML high risk}. \tag{18}
\end{align}

Fig 7. K-M estimate of CDF of survival times. (Legend: ALL, AML low, AML high.) 

Fig 8. Selection functions for the bone marrow transplantation data. (Legend: ALL, AML high/low risks, Length.) 



These functions are plotted in Figure 8. From Figure 8 we can see that the biased 
sampling scheme for group ALL is different from those of the other two groups. 
In groups AML low-risk and AML high-risk, patients with longer survival times 
without the recurrence of leukemia are relatively more likely to be included in 
the study (θ = 0.89). Their selection functions are close to the selection function 
of length biasing (dotted straight line, for which θ = 1). Group ALL, in contrast, 
has a relatively flatter selection function at larger survival times (θ = 0.45). 
Without considering the selection bias, the actual risks will be under-estimated 
in all three groups. 
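
The under-estimation can be seen in a small simulation. The sketch below resamples a hypothetical population of survival times with probability proportional to w(t) = t^θ, then compares the naive empirical CDF with an inverse-weighted one. It rests on several simplifying assumptions: there is no censoring, θ is treated as known, and the classical inverse-weighting correction is used in place of the TBE or the WKE. All names and the exponential population are illustrative.

```python
import numpy as np

def biased_resample(times, theta, size, rng):
    """Draw with replacement, with probability proportional to w(t) = t**theta."""
    times = np.asarray(times, dtype=float)
    w = times ** theta
    return rng.choice(times, size=size, replace=True, p=w / w.sum())

def naive_cdf(sample, t):
    """Empirical CDF that ignores the selection bias."""
    return float(np.mean(np.asarray(sample) <= t))

def weighted_cdf(sample, t, theta):
    """Empirical CDF with each observation weighted by 1 / w(t_i)."""
    sample = np.asarray(sample, dtype=float)
    inv_w = sample ** (-theta)
    return float(np.sum(inv_w[sample <= t]) / np.sum(inv_w))

rng = np.random.default_rng(1)
# Hypothetical "true" survival times: exponential with mean 2.
population = rng.exponential(scale=2.0, size=20000)
theta = 0.89  # value estimated in the text for the two AML groups
biased = biased_resample(population, theta, size=5000, rng=rng)

t0 = 2.0
true_F = 1.0 - np.exp(-t0 / 2.0)            # exact F(t0) for the exponential
naive_F = naive_cdf(biased, t0)             # ignores the bias
corrected_F = weighted_cdf(biased, t0, theta)
```

Because longer survival times are over-represented in the biased sample, the naive estimate of the cumulative failure probability falls below the true value, while the inverse-weighted estimate moves back toward it.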

By considering the effects caused by the biased sampling, the new cumulative 
failure probabilities for patients in the three groups are computed and are shown in 
Figure 9. From Figure 9 we find that the risks of the patients in group AML 
high-risk are higher than those of the other two groups. This is consistent with the 
Kaplan-Meier estimates. What differs from the K-M estimate is that the 
risk of group AML low-risk is actually as high as that of group ALL, at least in the 
early stage. 

Fig 9. New results for F(t). 

6. Discussion 

Since our estimation procedure also allows for a constant selection function, our 
semi-parametric model for biased survival data is more general than the standard 
model. Our estimates can therefore be used as a general-purpose procedure for 
analyzing right-censored survival data that may or may not be subject to biased 
sampling. If our estimates differ substantially from standard estimates that ignore 
the selection bias, such as the K-M estimate, then the MAR assumption or the 
assumption of no biased sampling may be invalid, and the standard estimate should 
be used with caution because it could be misleading. 

In our simulation experiments, we considered w(t) ∝ t^θ for a one-dimensional 
θ. The resulting semi-parametric model is more general than the length-biased or 
area-biased sampling models. The procedure should also work for other parametric 
forms of w and/or for multidimensional θ as long as Model Assumption 2.1 is valid. 
In practice, which family of w should we use? Some empirical experience may help 
in choosing such a family, and research on model selection for w is needed. In the 
absence of both of these aids, we recommend starting from a polynomial family 
of w for some reasonable range of θ. 

We used a kernel density estimate to estimate g in the TBE or in the WKE, 
and the Kaplan-Meier estimate to account for censoring bias. Other nonparametric 
smoothing estimates of the density, and nonparametric estimates of the survival 
function other than the Kaplan-Meier estimate, can in principle also be used in 
building our new estimates for f and θ. 

A full-fledged asymptotic analysis of our estimators is fairly difficult and is not 
the objective of this paper. Heuristically, however, if θ is known, it is conceivable 
that the TBE and WKE are consistent for f. When θ is unknown, if the sieve 
likelihood in (9) is a smooth function of θ, then the plug-in estimate of f by a good 
estimate θ̂ can be shown to be consistent. Note that we do not really need θ̂ to 
be consistent; all we need is that the resulting estimated selection function w(t, θ̂) 
is consistent for w(t, θ) up to a proportionality constant at the t = t_i's. See (1) and 
the expressions of the WKE and the TBE. We conjecture that the plug-in estimate of 
f by the pseudo MLE of θ is consistent under the sieve identifiability condition. This 
conjecture is supported by the general asymptotic property of sieve estimates (see, 
e.g., Bickel et al. [1]) and is confirmed by the simulation results shown in Table 1. 
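
The remark that the selection function matters only up to a proportionality constant can be checked with a short calculation. The sketch below assumes that relation (1) has the usual biased-sampling form, with g the density of the biased observations, and it ignores censoring. It is an illustration under those assumptions, not a restatement of the paper's estimators.

```latex
% Assumed form of relation (1): the biased density g in terms of f and w.
g(t) = \frac{w(t,\theta)\, f(t)}{\int w(s,\theta)\, f(s)\, ds}
\quad\Longrightarrow\quad
f(t) = \frac{g(t)/w(t,\theta)}{\int g(s)/w(s,\theta)\, ds}.

% Replacing w by c\,w for any constant c > 0 leaves f unchanged,
% because c cancels between numerator and denominator:
\frac{g(t)/\{c\, w(t,\theta)\}}{\int g(s)/\{c\, w(s,\theta)\}\, ds}
= \frac{g(t)/w(t,\theta)}{\int g(s)/w(s,\theta)\, ds}
= f(t).
```

Under these assumptions, an estimate of w that is correct only up to a multiplicative constant recovers the same f.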